Technical Field

The present invention relates to an information sorting method and an
information processing apparatus for presenting a search result in such a way that
allows a user to easily see the result, by performing a clustering process on a search
result provided by a general-purpose search service, and to a storage medium for
storing an information sorting process software program.

Background Art 

The presence of a search service is important to searching for information
desired by a user from among a vast amount of information present on a network.
For example, to search for a Web page using the Internet, a user selects one service
from among a plurality of search services, and then inputs a keyword as a search
request to obtain his desired information. The search service performs information
searching in response to the input keyword, and presents the search result to the
user.

Information searched by the search service frequently becomes large in
volume, and the user has difficulty finding the user's desired information from
among the vast amount of information. Since the number of Web pages is currently
increasing more and more, presenting a number of pieces of searched information in an
easy-to-understand fashion to the user becomes a serious concern.

Methods of presenting the searched information organized in an easy-to-see
fashion to the user are currently becoming commercially available. For example,
re-searching is performed using a keyword obtained from the results that have been
searched using the keyword input by the user. The user thus narrows down the
search so that a Web page desired by the user becomes easy to find. Specifically, a
keyword characteristic of a set of search results of the search is extracted to find a
set of information really desired by the user.

Sorting a set of pieces of information having a property of similarity out of a
vast amount of information is called "clustering". The clustering, which is a
well-known technique in the information processing field, is widely used to sort a
large amount of documents.



It is not an accepted practice to subject search results, provided by search 
services (general-purpose search services) widely used by common users, to the 
clustering process. As already discussed, typically, information is extracted in
response to the input keyword, and the extracted information is then simply 
presented to the user in a simple list. The user is thus forced to perform a 
troublesome job of finding the user's desired information from numerous listed 
pieces of information. 

It is an object of the present invention to present searched information in an
easy-to-see fashion to a user by clustering the search result provided by a
general-purpose search service.

Brief Description of the Drawings 

FIG. 1 is a block diagram of a first embodiment of the present invention,
showing the construction of an information sorting apparatus that performs a
clustering process on a search result provided by one search service.

FIG. 2 shows a plurality of documents as a result of search results provided
by a search service used in the first embodiment.

FIG. 3 is a block diagram showing the construction of a clustering processing
unit shown in FIG. 1.

FIG. 4 is a flow diagram diagrammatically showing the steps of a document
sorting process in the first embodiment.

FIG. 5 shows the content of a feature table illustrating the relationship
between a feature extracted from the title of each document and the document
having the title containing the feature.

FIG. 6 shows a sort result of each document based on the feature table shown
in FIG. 5.

FIG. 7 shows a clustering result of document titles based on the sort result
shown in FIG. 6.

FIG. 8 is a block diagram showing the construction of an information sorting
apparatus that clusters the search result provided by a single selected search
service.

FIG. 9 is a block diagram showing the construction of an information sorting
apparatus that clusters the search results provided by a plurality of search services.

FIG. 10 is a block diagram showing a second embodiment of the present
invention.

FIG. 11 shows clustering results that have been obtained by clustering a
plurality of documents resulting from the search by a search service.

FIG. 12 is a flow diagram diagrammatically showing information sorting
process steps in accordance with the second embodiment of the present invention.

FIG. 13 shows results that have been obtained by subjecting the clustering
result shown in FIG. 11 to a cluster order rearranging process.

FIG. 14 shows the construction of a third embodiment of the present
invention.

FIG. 15 is a flow diagram diagrammatically showing information sorting
process steps in accordance with the third embodiment of the present invention.

FIG. 16 shows a clustering result shown in FIG. 11 and a summary table
thereof.

FIG. 17 shows a clustering result that has been obtained by clustering URL
addresses and a summary table thereof.

Disclosure of the Invention

To achieve the above object, an information sorting method of the present
invention includes a step of acquiring a plurality of search results searched by a
search service through a clustering module, a step of performing a clustering
process on the search results through the clustering module, and a step of
outputting the clustering result from the clustering module.

The information sorting method may further include a step of converting,
through a converter module, the search result searched by the search service into a
format that is processed by the clustering module.

The converter module is arranged correspondingly to each of a plurality of
search services when the clustering process is performed correspondingly to the
plurality of search services.

A search process may be performed using one search service selected from the
plurality of search services and the clustering process may be performed on the
search result searched by the selected search service. Search processes may be
performed in parallel using at least two search services of the plurality of search
services, respective search results may be collected, and the clustering process may
be performed on the collected search results. Search processes may be performed in
parallel using at least two search services of the plurality of search services, and the
clustering process may be individually performed on the search results.

When the clustering process is performed on the search result, information to
be clustered is at least one of the title of a document, a URL address, an update
date, and a file size of an individual search result.

In the information sorting method, the order of clusters of a clustering result
may be rearranged using a score indicating the degree of match between the
clustering result and a search request for each document, and the clustering result
may be output with the cluster order thereof rearranged.

The rearranging process of the cluster order may include a step of calculating
the average of scores of the documents contained in each cluster to treat the average
of each cluster as a cluster score, and a step of rearranging the cluster order using
the cluster score.

The rearranging process of the cluster order may include a step of
determining the maximum value of the scores of the documents in each cluster to
treat the maximum score of each cluster as the cluster score, and a step of
rearranging the cluster order using the cluster score.

The rearranging process of the cluster order may include a step of
determining a score at a midway point or a substantially midway point in each
cluster when the documents contained in each cluster are arranged in the order of
magnitude of scores assigned thereto, to treat the score at the midway point or the
substantially midway point as the cluster score, and a step of rearranging the
cluster order using the cluster score.
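
The three ways of deriving a cluster score described above (average, maximum,
and midway point) can be illustrated with a minimal Python sketch. The function
names, the method labels, and the sample scores are assumptions made only for
illustration; the use of the lower median as the "substantially midway point" is
likewise one possible reading, not a requirement of the method.

```python
from statistics import mean, median_low


def cluster_score(scores, method="average"):
    """Derive one cluster score from the scores of the documents in a cluster."""
    if method == "average":
        return mean(scores)
    if method == "maximum":
        return max(scores)
    if method == "midway":
        # median_low returns an actual document score at, or just below,
        # the midpoint of the scores arranged in order of magnitude.
        return median_low(scores)
    raise ValueError(f"unknown method: {method}")


def rearrange(clusters, method="average"):
    """Return cluster names ordered from the highest cluster score to the lowest.

    clusters maps a cluster name to the list of scores of its documents.
    """
    return sorted(
        clusters,
        key=lambda name: cluster_score(clusters[name], method),
        reverse=True,
    )


# Hypothetical document scores, not taken from the drawings.
sample = {"sheet": [90, 20, 40], "mounting": [50, 60], "cassette": [30]}
print(rearrange(sample, "average"))
print(rearrange(sample, "maximum"))
print(rearrange(sample, "midway"))
```

With these sample scores the average and midway-point methods rank "mounting"
first, whereas the maximum method ranks "sheet" first, which shows how the choice
of cluster score changes the presented order.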

The cluster score determining step for rearranging the cluster order may be
individually performed correspondingly to the plurality of search services when the
clustering process is performed correspondingly to the search results provided by
the plurality of search services.

The clustering process may be performed based on a feature, wherein the
title of each document is detected and a word characteristic of and contained in the
title is extracted as the feature.

The manner of outputting the clustering result with the cluster order
rearranged may include a step of displaying the clusters in the order of the
magnitude of scores from a high score to a low score. When there are clusters
having the same cluster score, the cluster having the larger number of
documents therewithin may be positioned higher in the cluster order.
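
The display order with its tie-breaking rule can be expressed as a two-part sort
key. The following Python sketch assumes, purely for illustration, that each
cluster is represented as a (name, cluster score, documents) tuple.

```python
def display_order(clusters):
    """Order clusters from high score to low score.

    Clusters with the same cluster score are ordered so that the one
    containing more documents comes first.
    """
    return sorted(clusters, key=lambda c: (-c[1], -len(c[2])))


# Hypothetical clusters: "sheet" and "cassette" share the same score.
clusters = [
    ("cassette", 55, ["D1", "D4", "D7"]),
    ("sheet", 55, ["D1", "D4", "D6", "D7"]),
    ("mounting", 70, ["D2", "D3"]),
]
for name, score, docs in display_order(clusters):
    print(name, score, len(docs))
```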
The method may include a step of generating a clustering result summary
table indicating the summary of the clustering results based on the clustering 
result, and a step of outputting the clustering result summary table together with 
the clustering result. 

The clustering result summary table may include a cluster name of each
cluster which is obtained through the clustering process.

The clustering result may be mutually linked with the clustering result
summary table. When a cluster name portion of the clustering result summary
table is designated, the corresponding cluster portion of the clustering result is
displayed. When one cluster portion of the clustering result is designated, the
clustering result summary table is displayed.

When a cluster name portion of the clustering result summary table is
designated to display the corresponding cluster portion of the clustering result, the
head portion of an outline surrounding the cluster or the last line in the outline of
the cluster present immediately prior to the first cluster is displayed on the top of a
screen.

When the one cluster portion of the clustering result is designated to display 
the clustering result summary table, the clustering result summary table is 
displayed with the head portion thereof appearing first on the screen. 

The arrangement order of clusters forming the clustering result summary
table may coincide with the arrangement order of the clusters in the clustering result.

When the clustering result summary table is displayed, the manner of 
displaying the cluster names is changed in the clustering result summary table 
depending on the importance of each cluster in response to the clustering result. 

When a plurality of documents to be clustered are the ones which have been
searched using a keyword input by a user, the manner of displaying the cluster
names containing the keyword input by the user is different in the clustering result
summary table from the other cluster names.

An information sorting apparatus of the present invention includes a
clustering module for acquiring a plurality of search results searched by a search
service, performing a clustering process on the search results, and outputting the
clustering result.

The information sorting apparatus may further include a converter module
for converting the search result searched by the search service into a format that is
processed by the clustering module.
The information sorting apparatus may include a cluster order setting
module which rearranges the order of clusters of a clustering result using a score
indicating the degree of match between the clustering result and a search request
for each document and outputs the clustering result with the cluster order thereof
rearranged.

The information sorting apparatus may further include a summary table
generator unit for generating a clustering result summary table indicating the
summary of the clustering results based on the clustering result, and a display
control unit for outputting the clustering result summary table together with the
clustering result.

A storage medium of the present invention stores an information sorting
software program in which a clustering module performs a clustering process on a
plurality of search results that have been searched by a search service in response
to a search request of a user, and outputs the clustering result. The information
sorting software program includes a step of acquiring the search result from the
search service, a step of performing the clustering process on the acquired search
result, and a step of outputting the clustering result.

The step of performing the clustering process may be performed subsequent
to a step of converting the search result searched by the search service into a format
that is processed by the clustering module.

The information sorting software program may include a step of rearranging
the order of clusters of the clustering result using a score indicating the degree of
match between the clustering result and a search request for each document and a
step of outputting the clustering result with the cluster order thereof rearranged.

The information sorting software program may include a step of generating a
clustering result summary table indicating the summary of the clustering results
based on the clustering result, and a step of outputting the clustering result
summary table together with the clustering result.

Best Mode for Carrying out the Invention

The embodiments of the present invention are now discussed. The discussion
of the embodiments that follows covers not only the information sorting method and
the information sorting apparatus but also the specific process content of the
information sorting process software program of the present invention stored in the
storage medium.

(First embodiment)
FIG. 1 shows a first embodiment of the present invention, including, as major
components thereof, a search service 1, a converter module 2, and a clustering
module 3. The converter module 2 and the clustering module 3 in combination
correspond to an information sorting apparatus.

The search service 1 is a widely available one, such as one provided on the
Internet. When a user inputs a keyword as a search request, information searching is
performed on Web pages, for example, in response to the input keyword. A search
result provided by the search service 1 is output in a file form, and is transferred to
the clustering module 3. There is typically available a plurality of search services 1,
and output data formats differ from search service to search service. The converter
module 2 is arranged to convert the file into a form that allows the file to be read
regardless of which search service is employed.

The clustering module 3 includes a cluster-indexing information extraction
unit 31 which extracts information (i.e., information to be clustered), required for
clustering, from a search result file content output from the search service 1 (a file
content subsequent to a conversion by the converter module 2), a morphological
analysis unit 32 for performing a morphological analysis on the information
extracted as the cluster-indexing information, and a clustering processing unit 33
for performing a clustering process based on the morphological analysis result.

The cluster-indexing information extraction unit 31 extracts the information
to be clustered from the search result provided by the search service 1 and converted
by the converter module 2. Several pieces of information may be contemplated as
the cluster-indexing information (as will be discussed later). In this embodiment,
the title (the topic) of each of a plurality of documents extracted as the search result
is extracted as the cluster-indexing information. For example, a plurality of
documents D1, D2,..., D7 are now obtained as the search results as shown in FIG.
2. The documents D1, D2,..., D7 respectively contain titles T1, T2,..., T7, and bodies
A1, A2,..., A7 corresponding thereto.

In response to the search results, the cluster-indexing information extraction
unit 31 analyzes the documents D1, D2,..., D7, and detects the document titles
thereof. The detection of the title by the cluster-indexing information extraction
unit 31 is specifically carried out as below.

In a first method, a document structural format defines one portion thereof
reserved for a document title. That portion is treated as a title. In a second
method, a document structural format defines one portion thereof reserved for the
displaying of characters of size larger than standard size. That portion is treated as
a title. In a third method, sentences of a fixed number or words of a fixed number
from the start of the document are extracted as a document title. Further, the first
method, the second method, and the third method are successively performed in the
following way. When the portion defined as a title is found in the first method, that
portion is used as the title. Otherwise, the second method is performed. When
there is a designation for the displaying of characters of size larger than the
standard size, that portion is treated as a title. Otherwise, the third method is
performed to extract the title.
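
The successive application of the three methods can be sketched as follows in
Python, assuming for illustration that the documents are HTML pages. Treating the
title element as the reserved title portion, treating the tags listed in
LARGE_TEXT_TAGS as the designation of larger-than-standard characters, and the
default of eight words for the third method are all assumptions of this sketch,
not details given in this description.

```python
from html.parser import HTMLParser

# Assumption: these tags stand for "characters of size larger than standard size".
LARGE_TEXT_TAGS = {"h1", "h2", "h3", "big"}


class _TitleScanner(HTMLParser):
    """Collect title text, large-character text, and all other text of a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.large_depth = 0
        self.title, self.large, self.body = [], [], []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True
        elif tag in LARGE_TEXT_TAGS:
            self.large_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
        elif tag in LARGE_TEXT_TAGS and self.large_depth:
            self.large_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        if self.in_title:
            self.title.append(text)
            return
        if self.large_depth:
            self.large.append(text)
        self.body.append(text)


def detect_title(html, max_words=8):
    """Try the first, second, and third methods in that order."""
    scanner = _TitleScanner()
    scanner.feed(html)
    scanner.close()
    if scanner.title:            # first method: portion reserved for a title
        return " ".join(scanner.title)
    if scanner.large:            # second method: larger-than-standard characters
        return scanner.large[0]
    words = " ".join(scanner.body).split()
    return " ".join(words[:max_words])  # third method: fixed number of words
```

For instance, a page with no title element but with a heading falls through to the
second method, and a page with neither falls through to the third.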

The morphological analysis unit 32 performs morphological analysis on the 
portions extracted as the title from the document extracted by the cluster-indexing 
information extraction unit 31. 

Referring to FIG. 3, the clustering processing unit 33 includes a feature
extractor 331, a feature table generator 332, a document sorter 333, a document
sort result memory 334, an output controller 335, and a display unit 336. The
feature extractor 331 extracts features from the result of the morphological analysis
provided by the morphological analysis unit 32.

The feature table generator 332 generates a feature table indicating the
relationship between the features extracted by the feature extractor 331 and the
documents D1-D7. The feature table will specifically be discussed later.

The document sorter 333 references the above-mentioned feature table,
thereby grouping the documents D1-D7 into a plurality of clusters from the
standpoint of semantical similarity. Specifically, based on the features contained in
the titles T1, T2,..., T7 of the respective documents D1, D2,..., D7, documents having
the same feature in common are treated as one group, thereby forming one cluster.
The document sorter 333 may contain a synonymous feature dictionary (not shown).
To group the documents having the same feature in common into a cluster, the
document sorter 333 may determine a common feature referencing the synonymous
feature dictionary for the presence of any synonym. When there is a synonym, the
document sorter 333 may include the corresponding document into the same
cluster.
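
A minimal Python sketch of this grouping follows. The representation of the
synonymous feature dictionary as a mapping from a feature to a canonical feature,
and the sample entry "installation", are assumptions made only for illustration.

```python
from collections import defaultdict


def sort_documents(doc_features, synonyms=None):
    """Group documents that share a feature into one cluster per feature.

    doc_features maps a document ID to the features found in its title.
    synonyms (hypothetical dictionary) maps a feature to its canonical form,
    so that documents using a synonym join the same cluster.
    """
    synonyms = synonyms or {}
    clusters = defaultdict(list)
    for doc_id, features in doc_features.items():
        canonical = {synonyms.get(f, f) for f in features}
        for feature in sorted(canonical):
            clusters[feature].append(doc_id)
    return dict(clusters)


# Hypothetical input: D3 uses "installation" where D2 uses "mounting".
doc_features = {
    "D1": ["sheet", "cassette"],
    "D2": ["mounting"],
    "D3": ["installation"],
}
print(sort_documents(doc_features, {"installation": "mounting"}))
```

A document whose title contains several features appears in several clusters,
which is consistent with the document D7 belonging to more than one cluster in the
example discussed in this embodiment.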

The document sort result memory 334 stores the content sorted by the
document sorter 333. The output controller 335 reads the content of the document
sort result memory 334, and displays the content on the display unit 336.

The information sorting process steps of the present invention performed in
the above-referenced arrangement are now discussed. The information sorting
process steps of the present invention are roughly shown in a flow diagram in FIG.
4. Specifically, a step of acquiring a search result provided by a general
search engine is performed (step S1), a step of performing a clustering process on
the acquired search result is performed (step S2), and a step of outputting the
clustering result (step S3) is performed. The information sorting process steps are
now discussed in more detail, referring to specific examples.

The search service 1 outputs as the search result the documents D1, D2,...,
D7, which have been searched using a keyword input by a user, as shown in FIG. 2.
The search result, output in a file form, is converted into a format which can be
processed by the clustering module 3, and is then transferred to the clustering
module 3.

The cluster-indexing information extraction unit 31 extracts the titles from
the documents D1, D2,..., D7 input to the clustering module 3. For example, the
title T1 is detected from the document D1, the title T2 is detected from the
document D2, the title T3 is detected from the document D3,... The titles T1, T2,...,
T7 are thus respectively detected from the documents D1, D2,..., D7.

The morphological analysis unit 32 performs morphological analysis on the
titles T1, T2,..., T7 and feeds the morphological analysis result to the clustering
processing unit 33. In the clustering processing unit 33, the feature extractor 331
extracts features present in the titles T1, T2,..., T7 based on the morphological
analysis result provided by the morphological analysis unit 32.

The feature table generator 332 generates a feature table indicating the
relationship between the features and the documents having the respective titles
containing the features. FIG. 5 shows an example of the feature table. The feature
table here lists the relationship between the features of three or more types
extracted from the documents and the documents having the titles containing the
features. Numbers listed in the feature table indicate the number of occurrences of
a feature in the title of each document. For example, the number of the feature
"sheet" contained in each of the titles T1, T4, T6, and T7 of the documents D1, D4,
D6, and D7 is one.

Referring to the feature table shown in FIG. 5, the documents D1, D4, D6,
and D7 contain the feature "sheet" in the titles thereof. The feature "cassette" is
contained in the documents D1, D4, and D7, and the feature "mounting" is
contained in the documents D2, D3, D5, and D7. Returning to FIG. 2, the feature
portions in the titles are underlined.
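
A feature table of this kind can be sketched in Python as a per-document count of
each feature. The titles below are hypothetical wordings chosen only to agree with
the description of FIG. 2 and FIG. 5; the actual titles appear in the drawings. A
simple lowercase word split stands in for the morphological analysis, which is a
deliberate simplification.

```python
import re
from collections import Counter

# Hypothetical titles, worded only to agree with the described feature table.
TITLES = {
    "D1": "Sheet cassette",
    "D2": "Mounting an expansion memory",
    "D3": "Mounting an interface card",
    "D4": "Setting a sheet in the cassette",
    "D5": "Mounting a hard disk",
    "D6": "Sheet smearing during printing",
    "D7": "Mounting the sheet cassette",
}
FEATURES = ["sheet", "cassette", "mounting"]


def build_feature_table(titles, features):
    """Return {document ID: {feature: number of occurrences in the title}}."""
    table = {}
    for doc_id, title in titles.items():
        # Simplification: a word split replaces morphological analysis.
        counts = Counter(re.findall(r"[a-z]+", title.lower()))
        table[doc_id] = {feature: counts[feature] for feature in features}
    return table


table = build_feature_table(TITLES, FEATURES)
for feature in FEATURES:
    docs = [doc_id for doc_id, row in table.items() if row[feature] > 0]
    print(feature, docs)
```

With these titles, "sheet" is found in D1, D4, D6, and D7, "cassette" in D1, D4,
and D7, and "mounting" in D2, D3, D5, and D7, matching the relationship described
for the feature table.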

The document sorter 333 references such a feature table to cluster the
documents by feature. The sorting result is shown in FIG. 6.
The sorting result is also stored in the document sort result memory 334. In
the document sort result shown in FIG. 6, reference is made to a cluster (containing
documents D1, D4, D6, and D7) sorted according to the feature "sheet". As shown in
FIG. 2, the document D1 relates to the sheet cassette, the document D4 relates to the
sheet setting, the document D6 covers the smearing of sheets through printing, and
the document D7 relates to the mounting of the sheet cassette.

In this way, all documents D1, D4, D6, and D7 relate to the sheet. There will
be no problem if these documents are grouped in the same cluster, and the sort
result is deemed appropriate.

As for the cluster sorted according to the feature "cassette" (including
documents D1, D4, and D7), the document D1 relates to the sheet cassette, the
document D4 relates to the sheet setting, and the document D7 relates to the
mounting of the sheet cassette, as described in the documents shown in FIG. 2.

All documents D1, D4, D6, and D7 cover the setting of sheets. There will be
no problem if these documents are grouped in the same cluster, and the sort result
is deemed appropriate.

Reference is made to the cluster (containing documents D2, D3, D5, and D7)
sorted according to the feature "mounting". As shown in FIG. 2, the document D2
relates to the mounting of an expansion memory, the document D3 relates to the
mounting of an interface card, the document D5 relates to the mounting of a hard
disk, and the document D7 relates to the mounting of a sheet cassette.

All documents D2, D3, D5, and D7 relate to the mounting of something.
There will be no problem if these documents are grouped in the same cluster, and
the sort result is deemed appropriate.

The reason why such an appropriate sorting is performed is that the features
are extracted from the document titles, and that the documents are sorted according
to the features. The writer of each document typically conveys the main point of
the document in the title of the document. Sorting the documents using the
features contained in the title of each document prevents the sort result from
becoming discursive and lowers the possibility of generating a noise cluster. Since
the writer of each document conveys the main point of the document in the
document title, the sorting focuses on the viewpoint of the writer of the document.

FIG. 7 shows the clustering result that is actually presented to the user.
Listed in the table shown in FIG. 7 are the features and the titles containing
respective features. Viewing the table of the clustering result, the user clicks a title
portion that could possibly contain information desired by the user, and the body of
the document corresponding to the title then appears.

As described above, in the present embodiment, the user utilizes a
general-purpose search service, and enters a keyword to the search service. When the
documents D1, D2,..., D7 are found, the titles T1, T2,..., T7 thereof are extracted,
and the clustering process is then performed on the documents D1, D2,..., D7 based
on the features contained in the titles T1, T2,..., T7.

Conventionally, the search result has been a simple listing of search results
provided by the search service. In contrast, the present embodiment presents the
clustering result based on the title content of the search result. The clustering
result (shown in FIG. 7), clustered according to each feature contained in the titles,
is thus presented to the user in an easy-to-see fashion.

If the user finds information of interest to him, the user simply clicks the title
portion. The document having the title then appears.

In the above discussion, the search result, provided by a single
general-purpose search service, is clustered. Alternatively, the present invention is
applicable to search results provided by a plurality of search services.

The search services can have their own specialty fields. For example, one
search service has a large storage of sports-related information, another search
service stores a great deal of academic field information, and another search service
stores a vast amount of show-business related information. The search services
store abundant information in their own specialty fields, and each user has a good
chance of retrieving his desired information therefrom. In information searching,
the search services are selectively used in view of search purposes. The clustering
process using a plurality of search services is now discussed.

When a plurality of search services (the search services 1a, 1b, and 1c here) is
used, there are variations in the content, the size, and the output order of the
search results provided by the search services. For this reason, the apparatus
includes, correspondingly to the search services 1a, 1b, and 1c, converter modules
2a, 2b, and 2c for converting the files from the search services 1a, 1b, and 1c into a
format that can be handled by the clustering module 3. Since the construction of
the clustering module 3 here remains unchanged from that of the counterpart
shown in FIG. 1, like elements are identified with like reference numerals.

With the arrangement, a user can select the search service in view of the field
of information the user desires. For example, the first search service 1a has an edge
in sports. To search for sports-related information, information searching is
performed using the first search service 1a. If the second search service 1b has an
edge in the academic field, the second search service 1b is used to search for
academic information.

In this way, the user selects the search service depending on desired
information. Further, the clustering module 3 performs the clustering process on
the search result so that the search result is presented to users in an easy-to-see
fashion. The clustering process has already been discussed.

The selective use of the plurality of search services in this way allows
information searching to be performed in view of the advantage of the respective
search service. The selective use of the plurality of search services also allows the
search services to be switched from one to another when the one search service is
busy.

Searching may be performed in parallel by the plurality of search services,
the search results provided by these search services may be then combined, and the
clustering process may be performed onto the combined search result. FIG. 9
simply shows an arrangement that operates in this way.

Besides the arrangement shown in FIG. 8, the arrangement shown in FIG. 9
additionally includes a search result collector 4, for combining individual search
results, added between the converter modules 2a, 2b, and 2c correspondingly
arranged for the first through third search services 1a, 1b, and 1c and the clustering
module 3. The rest of the construction remains unchanged from that shown in FIG.
8, and like elements are identified with like reference numerals.

With this arrangement, the plurality of search services (the first through
third search services 1a, 1b, and 1c here) concurrently perform searching in parallel
in response to a keyword input by a user. The converter modules 2a, 2b, and 2c
correspondingly arranged for the search services 1a, 1b, and 1c respectively convert
the individual search results respectively provided by the search services 1a, 1b,
and 1c into a format that can be processed by the clustering module 3, and the
search result collector 4 then receives and combines the converted individual search
result files. Upon receiving the combined result, the clustering module 3 performs
the above-referenced clustering process on the combined result.
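
The flow from parallel searching through conversion to collection can be sketched
in Python as follows. The two search functions are hypothetical stand-ins that
return fixed data in deliberately different formats; real search services, their
interfaces, and their output formats are not specified in this description, and the
common record shape with "title" and "url" keys is an assumption of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for search services 1a and 1b, each with its own format.
def search_1a(keyword):
    return [("Sheet cassette", "example.com/a")]  # (title, url) tuples


def search_1b(keyword):
    return [{"name": "Mounting a hard disk", "link": "example.org/b"}]


# Converter modules 2a and 2b: one per service, all emitting the same record shape.
def convert_2a(raw):
    return [{"title": title, "url": url} for title, url in raw]


def convert_2b(raw):
    return [{"title": r["name"], "url": r["link"]} for r in raw]


PIPELINES = [(search_1a, convert_2a), (search_1b, convert_2b)]


def collect_results(keyword):
    """Play the role of search result collector 4.

    All services are queried in parallel, each result is converted by the
    converter paired with its service, and the converted results are combined.
    """
    with ThreadPoolExecutor(max_workers=len(PIPELINES)) as pool:
        futures = [pool.submit(search, keyword) for search, _ in PIPELINES]
        combined = []
        for future, (_, convert) in zip(futures, PIPELINES):
            combined.extend(convert(future.result()))
    return combined


print(collect_results("printer"))
```

The combined list would then be handed to the clustering process. Clustering each
service's converted result separately, instead of the combined list, would only
require skipping the final combining step.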
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The use of the pluraHty of search services for information searching acquires 
a wide variety of information, which could not be obtained using a single search 
service. Since the use of the plurality of search services widens the search area, 
information searching is performed in an exhaustive fashion. The user easily and 
5 efficiently comes to grips with what information associated with the keyword input 
by himself is globally present. The information thus obtained is subjected to the 
above-referenced clustering process and is presented to the user in an easy-to-see 
fashion. 

When the first through third search services 1a, 1b, and 1c shown in FIG. 9 
are used for information searching, the clustering process may be performed on the 
individual search results provided by the search services 1a, 1b, and 1c (i.e., the 
individual outputs of the converter modules 2a, 2b, and 2c), rather than on the 
combination of those search results (i.e., the combined output of the search result 
collector 4). The individual clustering results may then be presented to the user. 

A large amount of widely distributed information is thus efficiently 
searched. The user may compare the individual clustering results derived from the 
individual search results provided by the first through third search services 1a, 1b, 
and 1c and may learn the characteristics of each search service. 

This embodiment is not limited to the above discussion, and various 
modifications are possible within the scope of this embodiment. For example, in the 
above-referenced embodiment, the cluster-indexing information (the information to 
be clustered) is the title of the searched document. Besides the document title, the 
cluster-indexing information may be one of a URL address (excluding http://), an 
update date (a simple time or date/hour/time within the latest one month), and a 
file size (a byte size of the body of a Web page). These may be used alone or in 
combination. By selecting the cluster-indexing information, the apparatus performs 
clustering in a manner characteristic of the selected cluster-indexing information. 
The candidate cluster-indexing information may be arranged as selection items in a 
menu from which the user initially selects. If any selected item is not present, 
another item may be used. For example, when a title is selected but no title is 
available on a corresponding Web page, a URL address may be used. 
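
The fallback from a missing title to a URL address might look like the sketch below. The function name `indexing_text`, the dictionary-based document record, and the default preference order are assumptions introduced only for illustration.

```python
def indexing_text(doc, preferred=("title", "url")):
    """Return the first available cluster-indexing value for a document.

    `doc` is a hypothetical record such as {"title": ..., "url": ...}.
    Fields are tried in the order given by `preferred`.
    """
    for field in preferred:
        value = doc.get(field)
        if not value:
            continue  # the selected item is not present; try the next one
        if field == "url" and value.startswith("http://"):
            value = value[len("http://"):]  # the URL is used without http://
        return value
    return None

with_title = indexing_text({"title": "LP summary", "url": "http://example.com/lp"})
without_title = indexing_text({"title": "", "url": "http://example.com/lp"})
```

Here `with_title` is the title itself, while `without_title` falls back to `example.com/lp`.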

A processing program for this embodiment may be stored in storage media such as a 
floppy disk, an optical disk, and a hard disk. Such storage media fall within the 
scope of the present invention. The program may also be acquired through a network. 
(Second embodiment) 

As discussed in connection with the first embodiment, the clustering method 
of extracting a feature from the title of a document is excellent in terms of the 
amount of computation and process time and permits appropriate clustering. Since 
the amount of information to be clustered is relatively small compared with the 
overall volume of each document, however, the entire document is not always 
properly clustered. A title may not properly represent the content of the document, 
or an inharmonious title largely unrelated to the content of a document may be used. 
In such a case, clustering accuracy is substantially degraded with no good clustering 
result expected. 

The clustering method based on the extracted feature checks the frequency of 
occurrence of the feature, and then automatically sorts the documents for 
clustering. Since such a clustering process does not parse the document, the 
resulting clusters (sets of documents derived through the clustering process) are 
not necessarily sets of documents having semantic similarity. 

Even in such a case, information sorting preferably presents a clustering 
result satisfying the search requirement of the user. 

In this embodiment, the search result obtained from a general-purpose 
search service is subjected to the clustering process, and the order of the 
clusters derived through the clustering process is rearranged. The clustering result 
is thus presented to the user in a manner that meets the search requirement of the 
user. 

The second embodiment of the present invention is now discussed. 

FIG. 10 shows the construction of the second embodiment of the information 
sorting apparatus of the present invention. Referring to FIG. 10, there are shown, 
as major components, a search service 101, a converter module 102, a clustering 
module 103, and a cluster order rearranging module 104. The converter module 
102, the clustering module 103, and the cluster order rearranging module 104 in 
combination correspond to the information sorting apparatus. In particular, this 
embodiment is characterized by the cluster order rearranging module 104. 

The search service 101 is a widely available one, such as one on the Internet. When 
a user inputs a keyword as a search request, information searching is performed on 
Web pages, for example, in response to the input keyword. A search result provided 
by the search service 101 is output in a file form, and is transferred to the clustering 
module 103. There is typically available a plurality of search services, and output 
data formats differ from search service to search service. The converter 
module 102 is arranged to convert the file into a form that allows the file to be read 
regardless of which search service is employed. 

The clustering module 103 performs the clustering process on the search 
result provided by the search service 101 (the file content converted by the 
converter module 102). In this embodiment, a title is extracted from a document, a 
word characteristic of and contained in the title is extracted as a feature, and the 
extracted feature is subjected to the clustering process. 

Specifically, a portion extracted as the title of the document is subjected to 
morphological analysis, and a characteristic word is extracted from the 
morphological analysis result as the feature. A feature table indicating the 
relationship between the feature and the document associated with the feature is 
generated. The feature table associates each feature with the documents 
corresponding thereto, listing the number of occurrences of each feature in the title 
of each document. For example, features such as "summary", "LP", "specifications", 
"device", "semiconductor", and "electronic" are extracted from documents. 

A plurality of documents is now grouped into a plurality of clusters, each having 
semantic similarity, based on the feature table. Specifically, based on the feature 
contained in the title of each document, documents having the feature in common 
are grouped as one set, i.e., one cluster. 
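
The feature table and the grouping step can be illustrated with a minimal sketch. Two simplifications are assumed here and are not part of the embodiment: a lowercase whitespace split with a small stop-word list stands in for morphological analysis, and a document whose title contains several features is placed in every matching cluster. The sample titles are invented.

```python
from collections import Counter, defaultdict

# Assumption: a tiny stop-word list replaces real morphological analysis.
STOP_WORDS = {"a", "an", "the", "of", "for", "and", "in", "on"}

def extract_features(title):
    """Return the characteristic words of a title (simplified)."""
    return [word for word in title.lower().split() if word not in STOP_WORDS]

def build_feature_table(titles):
    """Map each document index to the count of each feature in its title."""
    return {i: Counter(extract_features(t)) for i, t in enumerate(titles)}

def group_by_feature(feature_table):
    """Group documents that share a feature into one cluster per feature."""
    clusters = defaultdict(list)
    for doc_id, counts in feature_table.items():
        for feature in counts:
            clusters[feature].append(doc_id)
    return dict(clusters)

titles = [
    "Summary of semiconductor device",
    "Semiconductor specifications",
    "Electronic device summary",
]
table = build_feature_table(titles)
clusters = group_by_feature(table)
```

With these sample titles, documents 0 and 2 share the feature "summary" and therefore form the summary cluster, while documents 0 and 1 form the semiconductor cluster.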

The clustering result shown in FIG. 11 is now output by the clustering 
module 103. Referring to FIG. 11, there is shown a table listing, as already 
discussed, the name of each cluster obtained through the clustering process (the 
cluster name here corresponds to the above-referenced feature), the title of each 
document belonging to the cluster, the number of documents contained in the 
cluster, and a number indicating a score of each title. 

The score is used as an objective measure indicating the degree of match 
between the input keyword and each document. The larger the score, the higher 
the degree of match of the document to the keyword. The unit of the score differs 
depending on the search service, for example % or points. In this embodiment, 
the score is expressed in points. 

Referring to FIG. 11, the clustering results provided by the clustering module 
103 are arranged in the order of the count of documents contained in the cluster. As 
already discussed, the summary cluster, the LP cluster, the specifications cluster, 
the device cluster, the semiconductor cluster, and the electronic cluster are 
arranged in that order from the top to the bottom of the table. 

The cluster order rearranging module 104 rearranges the display order of the 
clusters based on the clustering result provided by the clustering module 103. The 
details of the rearrangement of the display order will be discussed later. 

The operation of the second embodiment of the present invention thus constructed 
is now discussed. 

FIG. 12 diagrammatically shows a flow diagram of the information sorting 
process steps of this embodiment. A search result searched by the search service 
101 is acquired (step 12S1), the clustering process is performed on the acquired 
search result (step 12S2), and the clustering result is output (step 12S3). The 
cluster order of the clustering result is rearranged (step 12S4), and the rearranged 
clustering result is output (step 12S5). The information sorting process is discussed 
in more detail, referring to a specific example. 

In this embodiment, the clustering process performed by the clustering 
module 103 extracts the title of each document from the documents searched by the 
search service 101, extracts the features from the titles, generates the feature table 
indicating the relationship between the extracted features and the documents 
corresponding to the extracted features, and groups the documents into a plurality 
of clusters according to semantic similarity based on the content of the feature 
table. In this example, the user inputs the keyword "semiconductor" to the 
search service 101 as a search request, and the clustering module 103 clusters the 
obtained documents in response. FIG. 11 shows the clustering results. 

The clustering result provided by the clustering module 103 is input to the 
cluster order rearranging module 104 for the following process. 

For each of the clusters (the summary cluster, the LP cluster, the specifications 
cluster, the device cluster, the semiconductor cluster, and the electronic cluster) in 
the clustering result shown in FIG. 11, the scores of the documents belonging to the 
cluster are averaged. In this case, the scores are summed in each 
cluster, and the sum is then divided by the number of documents contained in the 
cluster. A simple arithmetic average is thus determined. 

For example, in the summary cluster shown in FIG. 11, the sum of the scores 
in the cluster is 579 points, and the number of documents is 16 in the search result. 
The average score is thus approximately 36 points. The sum of the scores of the 
"LP" cluster is 450 points with the number of documents being 16. The average score 
of the LP cluster is approximately 28 points. Similarly, the sum of the scores of the 
"specifications" cluster is 413 points with the number of documents being 14. The 
average score of the specifications cluster is approximately 29 points. The sum of 
the scores of the "device" cluster is 849 points with the number of documents 
being 9. The average score is approximately 94 points. The sum of the scores of the 
"semiconductor" cluster is 757 points with the number of documents being 7. The 
average score is approximately 108 points. The sum of the scores of the "electronic" 
cluster is 349 points with the number of documents being 4. The average score is 
approximately 87 points. 

The average score thus calculated is referred to as the score of each cluster 
(the cluster score). The clusters are then arranged in the order of magnitude of the 
cluster score, from a high cluster score to a low cluster score. 

Specifically, the highest cluster score is 108 points of the semiconductor 
cluster, the second highest cluster score is 94 points of the device cluster, and the 
third highest cluster score is 87 points of the electronic cluster, followed by the 
summary cluster (36 points), the specifications cluster (29 points), and the LP 
cluster (28 points) in that order. 

The cluster score is calculated for each cluster, and the clusters are 
rearranged in the order of cluster score from a high score to a low score. 
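
The averaging and rearrangement can be checked with a short sketch that uses the score sums and document counts quoted above for FIG. 11. The variable names are illustrative only.

```python
# (sum of scores, number of documents) per cluster, as quoted for FIG. 11.
clusters = {
    "summary": (579, 16),
    "LP": (450, 16),
    "specifications": (413, 14),
    "device": (849, 9),
    "semiconductor": (757, 7),
    "electronic": (349, 4),
}

# Cluster score: simple arithmetic average of the document scores.
cluster_scores = {
    name: total / count for name, (total, count) in clusters.items()
}

# Rearranged cluster order, from a high cluster score to a low cluster score.
ordered = sorted(cluster_scores, key=cluster_scores.get, reverse=True)
# ordered: semiconductor, device, electronic, summary, specifications, LP
```

The resulting order agrees with the order described for the rearranged clustering result.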

FIG. 13 lists the rearranged clustering results in a table. Referring to FIG. 
13, the semiconductor cluster appears as the first group from the top of the table, 
the device cluster appears as the second group, and the electronic cluster appears as 
the third group, followed by the summary cluster, the specifications cluster, and the 
LP cluster in that order. In the clustering results shown in FIG. 13, a cluster 
whose documents match the keyword "semiconductor" input by the user more 
closely on average comes at a higher order. 

The clustering result shown in FIG. 13 is now compared with the clustering 
result shown in FIG. 11. In the clustering result shown in FIG. 11, the summary 
cluster, the LP cluster, and the specifications cluster, which are likely to be formed 
of documents not related to the keyword "semiconductor" input by the user, appear 
high in the listing, while the semiconductor cluster, the device cluster, and 
the electronic cluster, which likely contain documents directly related to the 
keyword, appear low in the listing. Referring to FIG. 13, this arrangement 
order is reversed. The clusters likely to contain documents closely related to the 
keyword appear high in the listing. 
When a plurality of clusters have the same cluster score, a cluster having a higher 
number of documents therewithin becomes higher in order than the other clusters. 
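
One way to express this tie-breaking rule is a compound sort key, as in the sketch below. The function name `order_clusters` and the sample scores are hypothetical.

```python
def order_clusters(clusters):
    """Order cluster names by average score, then by document count.

    `clusters` maps a cluster name to the list of its document scores.
    Both keys are sorted from high to low.
    """
    def sort_key(name):
        scores = clusters[name]
        return (sum(scores) / len(scores), len(scores))

    return sorted(clusters, key=sort_key, reverse=True)

sample = {
    "a": [50, 50],       # average 50, two documents
    "b": [50, 50, 50],   # average 50, three documents
    "c": [80],           # average 80, one document
}
order = order_clusters(sample)
```

Clusters "a" and "b" share the same cluster score, so "b", which holds more documents, is placed ahead of "a".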

The sum and the average of the scores in each cluster may be displayed as 
shown in FIG. 13 or may not be displayed. As discussed above, the order of the 
clusters is determined based on the score assigned to each of the documents in the 
cluster rather than by simply sequencing the clusters by the number of 
documents contained therewithin (the number of the documents collected in the 
cluster). The cluster order compatible with the keyword is thus obtained. 

The clustering result is presented to the user as shown in FIG. 13. The user 
views the table of the clustering results, and clicks the title portion of a document 
which is likely to contain information desired by the user. A displaying process is 
performed to display the body of the document corresponding to the title. 

As discussed above, in the second embodiment of the present invention, a 
number of documents searched according to the keyword input by the user is 
subjected to the clustering process based on the feature contained in the titles of 
these documents, and the scores of the documents belonging to each cluster are then 
averaged on a cluster by cluster basis. The average score is treated as a cluster 
score. The cluster order is thus rearranged based on the cluster scores. 
Specifically, the clusters are sequenced in the order of cluster score from a high 
score to a low score. The clustering result is thus obtained as shown in FIG. 13. 

Since the cluster likely to contain information desired by the user is 
positioned at the top of the table, the user can search for desired information with 
ease. 

In the above discussion, the clustering process is performed on the search 
result provided by a single general-purpose search service. This embodiment is 
also applicable to the case in which the clustering process is performed on the search 
results provided by a plurality of search services. 

The search services have their own specialty fields. For example, one search 
service has a large storage of sports-related information, another search service 
stores a great deal of academic field information, and another search service stores 
a vast amount of show-business related information. The search services store 
abundant information in their own specialty fields, and each user has a good chance 
of retrieving desired information therefrom. In information searching, it is a 
widely accepted practice to selectively use the search services in view of search 
purposes. 
When a plurality of search services is used, the content, the size, and the 
output order of the search results provided by the search services vary. For 
this reason, the apparatus includes, correspondingly to the plurality of search 
services, converter modules 102 for converting the files from the search services into 
a format that can be handled by the clustering module 103. When the cluster order 
is rearranged in the clustering result, a process of determining the cluster score is 
performed correspondingly to each search service. 

For example, the cluster order rearrangement process of this embodiment 
may require additional steps depending on the search service. When the width of score 
distribution is extremely wide (for example, the score ranges from a minimum of 2 
to a maximum of 1000), the logarithm of each score may be taken. When a 
document has an excessively small score (for example, one document has a score of 
2 or 3 while almost all other documents have scores on the order of several 
hundreds), that document is excluded from the clustering process. 
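
These two adjustments might be sketched as follows. The thresholds `spread_ratio` and `outlier_fraction`, the use of the median as the reference for "excessively small", the base-10 logarithm, and the choice to exclude small scores before testing the spread are all assumptions made for illustration. Scores are assumed to be positive.

```python
import math
from statistics import median

def normalize_scores(scores, spread_ratio=100.0, outlier_fraction=0.05):
    """Drop excessively small scores, then compress a very wide range.

    A score is treated as excessively small when it falls below
    `outlier_fraction` times the median score. The remaining scores are
    replaced by their base-10 logarithms when the largest is at least
    `spread_ratio` times the smallest. Both thresholds are illustrative.
    """
    cutoff = outlier_fraction * median(scores)
    kept = [s for s in scores if s >= cutoff]
    if kept and min(kept) > 0 and max(kept) / min(kept) >= spread_ratio:
        kept = [math.log10(s) for s in kept]
    return kept

# Scores of 2 and 3 among scores of several hundreds are excluded.
mostly_large = normalize_scores([2, 3, 300, 400, 500])

# A distribution spanning 5 to 1000 is wide enough to be log-scaled.
wide = normalize_scores([5, 10, 100, 1000])
```

In practice such thresholds would have to be tuned per search service, since each service scores documents by its own method.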

With the ability to work with a plurality of search services, the user selects 
the search service depending on the field of desired information. The selective use 
of the plurality of search services in this way allows information searching to be 
performed in view of the advantage of the respective search service. The selective 
use of the plurality of search services also allows the search services to be switched 
from one to another in a flexible manner when one search service is busy. 

The second embodiment of the present invention is not limited to the above 
discussion, and various modifications are possible within the scope of this 
embodiment. In the discussion of this embodiment, a simple arithmetic average of 
the scores of the documents contained in each cluster is used for the cluster score. 
The maximum score among the scores of the documents in each cluster may be used 
as the cluster score. Alternatively, the score of a document at a midway point of the 
score distribution of the documents (the median score) may be used as the cluster 
score. 

The use of the maximum score in each cluster eliminates the need for a 
summation and a division in the determining of the cluster order, thereby reducing 
the amount of calculation. Even if there is a small number of documents having 
extremely small scores in the same cluster, the influence of such small scores is 
limited. In the same manner as with the maximum score, the use of the median 
score in each cluster reduces the amount of calculation. Further, when the median 
score is used, the influence of extremely high scores and extremely low scores is 
limited. 
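
The three ways of deriving a cluster score can be compared side by side in a small sketch. The function name, the `method` parameter, and the sample scores are hypothetical. `statistics.median_low` is used so that the median is always the score of an actual document, which is one reading of "the score of a document at a midway point".

```python
from statistics import mean, median_low

def cluster_score(scores, method="mean"):
    """Return a cluster score from the document scores of one cluster."""
    if method == "mean":
        return mean(scores)        # simple arithmetic average
    if method == "max":
        return max(scores)         # no summation or division needed
    if method == "median":
        return median_low(scores)  # insensitive to extreme scores
    raise ValueError(f"unknown method: {method}")

# One extremely low and one extremely high score in the same cluster.
scores = [2, 90, 95, 100, 1000]
by_mean = cluster_score(scores, "mean")
by_max = cluster_score(scores, "max")
by_median = cluster_score(scores, "median")
```

With these sample scores the average is pulled up to 257.4 by the single score of 1000, whereas the median stays at 95.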
In the above discussion of the present embodiment, the title of the searched 
document is used as the cluster-indexing information (the information to be 
clustered). Besides the document title, the cluster-indexing information may be one 
of a URL address (excluding http://), an update date (a simple time or 
date/hour/time within the latest one month), and a file size (a byte size of the body 
of a Web page). These may be used solely or in combination. By selecting the 
cluster-indexing information, the apparatus performs clustering in a manner 
characteristic of the selected cluster-indexing information. What cluster-indexing 
information to select may be arranged as selection items from which the user 
initially selects in a menu. If any selected item is not present, another item may be 
used. For example, when a title is selected but no title is available on a 
corresponding Web page, a URL address may be used. 

A third embodiment of the information sorting apparatus of the present 
invention is now discussed. 

When the number of clusters obtained through clustering is not so large in 
the information sorting process, learning all clustering results does not take much 
of the user's time. 

The number of clusters obtained through the clustering process occasionally 
becomes as large as several tens to several hundreds. In such a case, even 
merely viewing all clustering results requires a great deal of attention. 

In the third embodiment of the present invention, the clustering process is 
performed on the search result provided by a general-purpose search service, and a 
table for allowing the user to glance at the summary of the clustering results 
obtained through the clustering process is formed. In this way, the user can 
efficiently search for desired information. 

The third embodiment is now discussed in detail. 

FIG. 14 diagrammatically shows the third embodiment of the present 
invention. Referring to FIG. 14, there are shown a search service 141, a converter 
module 142, a clustering module 143, a clustering result summary table generator 
module (hereinafter referred to as a summary table generator module) 144, and a 
display control module 145. The converter module 142, the clustering module 143, 
the summary table generator module 144, and the display control module 145 in 
combination correspond to the information sorting apparatus. In particular, the 
third embodiment is characterized by the summary table generator module 144. 
The search service 141 is a widely available one, such as one on the Internet. When 
a user inputs a keyword as a search request, information searching is performed on 
Web pages, for example, in response to the input keyword. A search result provided 
by the search service 141 is output in a file form, and is transferred to the clustering 
module 143. There is typically available a plurality of search services 141, and 
output data formats differ from search service to search service. The 
converter module 142 is arranged to convert the file into a form that allows the file 
to be read regardless of which search service is employed. 

The clustering module 143 performs the clustering process on the search 
result provided by the search service 141 (the file content converted by the 
converter module 142). In this embodiment, a title is extracted from a document, a 
word characteristic of and contained in the title is extracted as a feature, and the 
extracted feature is subjected to the clustering process. 

Specifically, a portion extracted as the title of the document is subjected to 
morphological analysis, and a characteristic word is extracted from the 
morphological analysis result as the feature. A feature table indicating the 
relationship between the feature and the document associated with the feature is 
generated. The feature table associates each feature with the documents 
corresponding thereto, listing the number of occurrences of each feature in the title 
of each document. For example, features such as "summary", "LP", "specifications", 
"device", "semiconductor", and "electronic" are extracted from documents. 

A plurality of documents is now grouped into a plurality of clusters, each having 
semantic similarity, based on the feature table. Specifically, based on the feature 
contained in the title of each document, documents having the feature in common 
are grouped as one set, i.e., one cluster. 

The clustering result discussed in connection with the second embodiment 
and shown in FIG. 11 is now output by the clustering module 143. Referring to 
FIG. 11, there is shown a table listing, as already discussed, the name of each 
cluster obtained through the clustering process (the cluster name here corresponds 
to the above-referenced feature), the title of each document belonging to the cluster, 
the number of documents contained in the cluster, and a number indicating a score 
of each title. 
The score is used as an objective measure indicating the degree of match 
between the input keyword and each document. The larger the score, the higher 
the degree of match of the document to the keyword. 

Referring to FIG. 11, the clustering results provided by the clustering module 
143 are arranged in the order of the count of documents contained in the cluster. As 
already discussed, the summary cluster, the LP cluster, the specifications cluster, 
the device cluster, the semiconductor cluster, and the electronic cluster are 
arranged in that order from the top to the bottom of the table. 

The summary table generator module 144 generates a clustering result 
summary table (a summary table) of the clustering result provided by the 
clustering module 143. 

The display control module 145 performs display control, thereby presenting 
the clustering result provided by the clustering module 143 and the summary table 
provided by the summary table generator module 144. In this embodiment, in 
addition to the displaying of the clustering result and the summary table, the 
display control module 145 performs display control to present a mutually linked 
portion between the clustering result and the summary table and to present a 
cluster of concern to the user in a visibly distinct fashion. The display control will 
be specifically described later. 

The information sorting process in the third embodiment of the present invention 
thus constructed is now discussed. FIG. 15 diagrammatically shows a flow diagram 
of the information sorting process steps of this embodiment. A search result 
searched by the search service 141 is acquired (step 15S1), the clustering process is 
performed on the acquired search result (step 15S2), and the clustering result is 
output (step 15S3). A step of generating a summary table is performed based on the 
clustering result (step 15S4), and the generated summary table is displayed 
together with the above-mentioned clustering result (step 15S5). To display the 
generated summary table together with the clustering result, the 
summary table may be superimposed on the clustering result on a screen. 
Alternatively, the summary table and the clustering result may be separately arranged 
so that the display unit displays the summary table followed by the clustering 
result. When the clustering result is large in volume, the user may scroll through 
the clustering result to successively see it. 

The information sorting process steps in the third embodiment of the present 
invention are discussed, referring to a specific example. 
In this embodiment, the clustering process performed by the clustering 
module 143 extracts the title of each document from the documents searched by the 
search service 141, extracts the features from the titles, generates the feature table 
indicating the relationship between the extracted features and the documents 
corresponding to the extracted features, and groups the documents into a plurality 
of clusters according to semantic similarity based on the content of the feature 
table. In this embodiment, for example, the user inputs the keyword 
"semiconductor" to the search service 141 as a search request, and the clustering 
module 143 clusters the obtained documents in response. FIG. 11 shows 
the clustering result. 

The clustering result, provided by the clustering module 143, is input to the 
summary table generator module 144 for the following process. 

A summary table based on cluster names (such as "summary", "LP", 
"specifications", "device", "semiconductor", and "electronic") is generated for the 
clusters (the summary cluster, the LP cluster, the specifications cluster, the device 
cluster, the semiconductor cluster, and the electronic cluster) in the clustering 
result shown in FIG. 11. The summary table is thus presented together with the 
clustering result. 

FIG. 16 shows a display example in which the summary table 1610 is 
presented together with the clustering result 1620. In the display example, the 
summary table 1610 is followed by the clustering result 1620. The number of 
clusters of the clustering result 1620 is as small as six. In practice, however, the 
number of clusters may be as many as several tens to several hundreds. To search 
for desired information, the user must view all clustering results. If all clustering 
results are merely presented, the user must perform a tedious job to find desired 
information. Glancing at the cluster names of the summary table, the user grasps 
what clusters are contained in the clustering results, and which cluster 
possibly includes information desired by the user. 

The cluster names forming the summary table 1610 are respectively linked to 
the clustering results. Even when the clusters are too many for the clustering 
results to be displayed in one screen, the user simply clicks any desired cluster 
name in the summary table 1610 shown in FIG. 16, and the cluster portion 
corresponding to the desired cluster name in the clustering result 1620 is 
immediately displayed. If the cluster name in the clustering results is clicked in 
this state, the display control will immediately return the screen back to the 
summary table. 
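
The link between the summary table and the clustering result can be pictured with a small data-structure sketch. The function name `build_summary`, the plain-text row layout, and the sample titles are hypothetical. An actual display layer might use hyperlinks or scroll positions instead of row indices.

```python
def build_summary(clusters):
    """Build a summary table and the rendered rows of a clustering result.

    `clusters` is an ordered list of (cluster name, list of titles).
    The returned summary maps each cluster name to the index of the row
    where that cluster starts, which a display layer could use as the
    jump target when the name is clicked.
    """
    summary = {}
    rows = []
    for name, titles in clusters:
        summary[name] = len(rows)  # jump target for this cluster name
        rows.append(f"[{name}] ({len(titles)} documents)")
        rows.extend(f"  {title}" for title in titles)
    return summary, rows

clusters = [
    ("summary", ["Product summary", "Summary of results"]),
    ("LP", ["LP catalog"]),
]
summary, rows = build_summary(clusters)
```

Because a Python dictionary preserves insertion order, the cluster names in `summary` appear in the same order as the clusters in `rows`.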
The following functions are added to further promote ease of use of the 
apparatus in the display control. 

The arrangement order of the cluster names in the summary table agrees 
with the arrangement order of the clusters in the clustering results. In the 
clustering results shown in FIG. 11, the arrangement order of the clusters in the 
clustering result is determined by the number of documents. For example, the 
number of documents in the summary cluster is 16, the number of documents in the 
LP cluster is 16, the number of documents in the specifications cluster is 14, the 
number of documents in the device cluster is 9, the number of documents in the 
semiconductor cluster is 7, and the number of documents in the electronic cluster is 
4. Accordingly in the summary table, "summary", "LP", "specifications", "device", 
"semiconductor", and "electronic" are arranged from the left to the right as shown in 
FIG. 16. 

The clusters in the clustering results may be sequenced in order of the 
cluster scores, rather than the number of documents. As already discussed, 
the clustering results shown in FIG. 11 list the titles of the documents, the number 
of documents in each cluster, and the score of each document in a table form. 

The score is a value assigned to each document when a search service has 
searched for information according to the search service's own method in response 
to an input keyword. The score is typically used as an objective measure indicating 
the degree of match between the input keyword and the document corresponding 
thereto. The search results provided by a general-purpose search service are 
typically associated with respective scores. 

Although the ways of calculating scores and the concepts behind the scores 
are different from search method to search method, it can be generally said that the 
larger the score, the more the document matches the keyword. 

It is therefore contemplated that the scores in each cluster are averaged on a 
cluster by cluster basis to arrange the clusters in the order of scores from a high 
score to a low score. When the clusters are arranged in the order of average scores 
from a high score to a low score in the clustering results, the cluster names in the 
generated summary table are also arranged in that cluster order. 

The average score of the summary cluster is approximately 36 points. (Since
the score indicates the degree of match of the document to the keyword, the unit of
the score differs depending on the search service, for example % or points.
In this embodiment, the score is expressed in points.) The average score of the LP
cluster is approximately 28 points. The average score of the specifications cluster is
approximately 29 points. The average score of the device cluster is approximately
94 points. The average score of the semiconductor cluster is approximately 108
points. The average score of the electronic cluster is approximately 86 points. The
average scores thus calculated are the scores of the respective clusters (called the
cluster scores). Specifically, the highest cluster score is the 108 points of the
semiconductor cluster, the second highest cluster score is the 94 points of the device
cluster, and the third highest cluster score is the 87 points of the electronic cluster,
followed by the summary cluster (36 points), the specifications cluster (29 points),
and the LP cluster (28 points) in that order.
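
The averaging and ordering described above can be sketched as follows. This is a
minimal illustration in Python, not the implementation of the embodiment: the
function name, the data layout, and the per-document scores are assumptions chosen
only so that the averages resemble the cluster scores quoted above.

```python
from statistics import mean


def rank_clusters_by_average_score(clusters):
    """Return (cluster name, cluster score) pairs, highest average score first."""
    cluster_scores = {
        name: mean(scores)   # cluster score = average of the document scores
        for name, scores in clusters.items()
        if scores            # skip empty clusters, for which mean() would raise
    }
    # sorted() is stable, so clusters with equal scores keep their input order.
    return sorted(cluster_scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical per-document scores in points; not the data behind FIG. 11.
clusters = {
    "summary": [30, 42],
    "LP": [28],
    "device": [90, 98],
    "semiconductor": [100, 116],
}
ranking = rank_clusters_by_average_score(clusters)
summary_table_order = [name for name, _ in ranking]
print(summary_table_order)
```

The same ordered list of names can drive both the clustering results and the
summary table, which keeps the two in the same order.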

When the clusters in the clustering results are rearranged in descending order of
cluster score, the cluster names in the summary table appear in the same order:
"semiconductor", "device", "electronic", "summary", "specifications", and "LP".

The sum and the average of the scores in each cluster may be displayed as 
shown in FIG. 16 or may not be displayed.

Since the cluster order in the clustering results typically has some meaning,
the order of the cluster names in the summary table is preferably set to agree with
the cluster order in the clustering results. The user typically views listed
information from the top to the bottom of a screen, so a summary table whose
cluster names follow the arrangement order of the clusters in the clustering
results helps the user find desired information.

When a cluster name in the summary table 1610 is clicked to display the
cluster portion of the clustering result 1620 linked thereto, the outline that encloses
the cluster (referred to as a cluster outline) is displayed with the top portion thereof
appearing on the first line of the screen. If the cluster were displayed with the
cluster name appearing on the first line of the screen, the cluster feature
corresponding to the cluster name (the document titles contained in the cluster in
FIG. 11) could disadvantageously fail to appear on the screen. Specifically, when
"semiconductor" is clicked in the summary table 1610 while the semiconductor
cluster in the clustering result 1620 is not on the screen, the display shifts to the
semiconductor cluster portion of the clustering result 1620 immediately after the
clicking operation. The cluster feature on the top line of the cluster (the document
title "157 SEMICONDUCTOR DIVISION ENVIRONMENTAL GUIDE LINE") then
occasionally remains hidden.

To avoid such a problem, the cluster outline is displayed with the top portion
of the cluster outline appearing on the first line of the screen. The cluster feature
at the top of the cluster outline is thus reliably presented.

For greater certainty, the last line of the cluster immediately preceding the
cluster of interest may be displayed on the first line of the screen. Specifically,
when the cluster name "semiconductor" in the summary table 1610 is clicked in the
above example, the semiconductor cluster in the clustering results is displayed.
In this case, the cluster feature on the last line of the device cluster immediately
preceding the semiconductor cluster ("56 DEVICE - SEMICONDUCTOR - ASSP" in
FIG. 11) may be displayed on the first line of the screen. With the cluster feature
on the last line of the immediately preceding cluster displayed on the first line of
the screen, the cluster features in the cluster of interest are reliably displayed.
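
The two scroll positions discussed above can be illustrated with a short Python
sketch. It assumes, purely for illustration, that the clustering result is a vertical
list of lines and that each cluster occupies a known number of consecutive lines
beginning at the top of its cluster outline; the function name and the layout data
are hypothetical.

```python
def first_visible_line(cluster_line_counts, target, show_previous_last_line=False):
    """Return the 0-based line of the clustering result to put on the first screen line.

    cluster_line_counts is an ordered list of (cluster name, number of lines).
    """
    top = 0  # line index of the top of the current cluster outline
    for name, line_count in cluster_line_counts:
        if name == target:
            if show_previous_last_line and top > 0:
                return top - 1  # last line of the immediately preceding cluster
            return top          # top portion of the target cluster outline
        top += line_count
    raise KeyError(target)


# Hypothetical layout: device (3 lines), semiconductor (4 lines), electronic (2 lines).
layout = [("device", 3), ("semiconductor", 4), ("electronic", 2)]
print(first_visible_line(layout, "semiconductor"))        # top of the outline
print(first_visible_line(layout, "semiconductor", True))  # one line earlier
```

For the first cluster there is no preceding line, so the sketch falls back to the top
of the outline.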

The cluster names displayed in the summary table 1610 may be presented in
different sizes and different colors depending on the cluster content in the
clustering result 1620. The cluster content here specifically indicates the degree of
importance of each cluster, such as the degree of match of the documents to the
keyword input by the user. The degree of importance of a cluster is determined by
the number of documents contained in the cluster or by the score of the cluster. As
already discussed, the average of the scores in each cluster is calculated, and the
cluster having the maximum average score has the highest degree of importance.
The cluster name of the cluster having the highest degree of importance is
displayed in a manner different from that of the remaining cluster names in the
summary table 1610.

In the above example, the semiconductor cluster, from among the clustering
results shown in FIG. 11, has the highest cluster score, so the displaying manner of
the cluster name "semiconductor" in the summary table, corresponding to the
semiconductor cluster, is made different from that of the remaining cluster names.
Specifically, the cluster name "semiconductor" may be displayed in a color different
from that of the remaining cluster names in one embodiment. The outline enclosing
the cluster name "semiconductor" may be made bolder than the remaining outlines
in another embodiment. The area enclosed by the outline of the cluster name
"semiconductor" may be set larger than the areas of the other cluster names in yet
another embodiment. The cluster name "semiconductor" may be blinked in still
another embodiment. The cluster name "semiconductor" is thus presented to the
user in a visibly distinct way.

Similarly, a cluster having a larger number of documents may be presented
to the user in a visibly distinct way. A cluster having a higher score may be
presented in a color different from that of the remaining clusters. A cluster having
a larger number of documents may be presented with the area of the outline thereof
expanded. In this way, the clusters may be displayed with the displaying manner
thereof changed depending on the cluster characteristics. Glancing at the summary
table 1610, the user quickly learns the characteristics of the clusters.
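
One way to derive such a displaying manner from the cluster characteristics is
sketched below in Python. The style attributes ("color", "enlarged"), the colors,
and the sample statistics are assumptions made for illustration only; the
embodiment does not prescribe particular values.

```python
def summary_name_styles(cluster_stats):
    """Map each cluster name to a display style for the summary table.

    cluster_stats maps a cluster name to (document count, cluster score).
    """
    if not cluster_stats:
        return {}
    top_score = max(score for _, score in cluster_stats.values())
    top_count = max(count for count, _ in cluster_stats.values())
    styles = {}
    for name, (count, score) in cluster_stats.items():
        styles[name] = {
            # The highest-scoring cluster name gets a distinct color.
            "color": "red" if score == top_score else "black",
            # The cluster with the most documents gets an enlarged outline area.
            "enlarged": count == top_count,
        }
    return styles


# Hypothetical document counts paired with cluster scores in points.
stats = {"semiconductor": (4, 108), "device": (6, 94), "LP": (2, 28)}
styles = summary_name_styles(stats)
print(styles["semiconductor"])
```

Blinking or a bolder outline could be represented by further attributes in the same
dictionary.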

In the summary table 1610, a cluster name containing the keyword input by the
user is presented in a manner different from that of the other cluster names so
that the user notices it.

For example, the keyword input by the user is "semiconductor" in the 
clustering result shown in FIG. 11. Among the clusters in the clustering results, 
the semiconductor cluster is the very cluster containing the keyword. 

In the summary table 1610 derived from the clustering results 1620, the
"semiconductor" portion is displayed in a manner different from that of the other
portions. For example, the "semiconductor" portion may be blinked, differently
colored, or both blinked and differently colored to catch the eye of the user.
Typically, the user wants to find a cluster name identical to the keyword input by
the user. An arrangement may be made so that the user learns at a glance the cluster
name identical to the keyword in the summary table 1610. Such an arrangement
helps the user find desired information.
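
Marking the cluster names that contain the keyword can be sketched in Python as
follows. The function name is hypothetical, and the case-insensitive comparison is
an assumption; the embodiment only requires that a cluster name containing the
keyword be displayed differently.

```python
def mark_keyword_clusters(cluster_names, keyword):
    """Return (name, highlighted) pairs for the summary table.

    highlighted is True when the cluster name contains the keyword.
    """
    needle = keyword.casefold()
    return [
        # An empty keyword highlights nothing.
        (name, bool(needle) and needle in name.casefold())
        for name in cluster_names
    ]


names = ["semiconductor", "device", "summary"]
marked = mark_keyword_clusters(names, "Semiconductor")
print(marked)
```

The display layer can then blink or recolor every name whose flag is set.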

When a cluster name is clicked while a portion of the clustering result 1620 is
presented on the screen, the display returns to the summary table 1610. In this
case, the summary table 1610 is preferably displayed with the head portion thereof
appearing first on the screen.

Although the summary table 1610 presents the clustering results in a simple
format, the size thereof can become very large. There can also be times when a
plurality of summary tables are generated. In the discussion so far, the title of each
document is used, the clustering process is performed on the titles, and the
summary table is generated based on the clustering result obtained through the
clustering process. The clustering process may be performed not only on the titles
but also on URL addresses (excluding http://).

For example, using the URL addresses, the clustering process can be
performed on the documents that were used to obtain the clustering results shown
in FIG. 11, and a summary table can be generated from the resulting clustering
results.

FIG. 17 shows a clustering result 1730 which has been obtained by
performing the clustering process on the same documents shown in FIG. 11
according to the URL addresses, and a summary table 1740 generated based on the
clustering result 1730. The cluster names obtained through the clustering process
are URL addresses, including "www.epson.co.jp", "www.i-love-epson.ne.jp", and
"other URL". The cluster names forming the summary table 1740 are the same URL
addresses, including "www.epson.co.jp", "www.i-love-epson.ne.jp", and "other URL".
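
Clustering by URL address can be sketched in Python as grouping the documents by
host name after the "http://" prefix is removed. The rule that hosts with fewer than
two documents fall into "other URL", the function name, and the page paths in the
sample data are assumptions for illustration; the source does not state how the
"other URL" cluster is formed.

```python
from collections import defaultdict


def cluster_by_host(urls, min_cluster_size=2):
    """Group URL addresses by host; small groups are merged into "other URL"."""
    groups = defaultdict(list)
    for url in urls:
        # Exclude the leading "http://" before looking at the address.
        address = url[len("http://"):] if url.startswith("http://") else url
        host = address.split("/", 1)[0]
        groups[host].append(url)

    clusters, other = {}, []
    for host, members in groups.items():
        if len(members) >= min_cluster_size:
            clusters[host] = members
        else:
            other.extend(members)
    if other:
        clusters["other URL"] = other
    return clusters


# Host names follow FIG. 17; the paths and the last URL are hypothetical.
urls = [
    "http://www.epson.co.jp/a.htm",
    "http://www.epson.co.jp/b.htm",
    "http://www.i-love-epson.ne.jp/",
    "http://www.i-love-epson.ne.jp/c.htm",
    "http://example.com/x",
]
url_clusters = cluster_by_host(urls)
print(list(url_clusters))
```

The keys of the returned dictionary serve directly as the cluster names of the
summary table.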

As discussed above, the clustering process can be performed by a variety of
methods, and a plurality of summary tables is thus obtained, corresponding to the
respective clustering results.

In this way, a plurality of summary tables is generated, or a single summary
table having a high volume of data is generated. When the cluster name portion of
the summary table is called while the clustering results are being viewed on the
screen, the head portion of the first of the plurality of summary tables is displayed
on the first line of the screen if a plurality of summary tables has been generated.
If a single large summary table has been generated, the head portion of that
summary table is displayed on the first line of the screen.

For example, the above arrangement works in the situation where the user
wants to return to the summary table to see the summary of the clustering result
after viewing the clustering result. When the display shifts back to the cluster
name in the summary table in this case, the user can be at a loss as to which
portion of the summary table is currently presented if a plurality of summary tables
or a single summary table with a vast amount of data is presented. With this
arrangement, however, the user views the entire summary table, because the
summary table is displayed with the head portion thereof appearing on the first
line of the screen.

With a variety of functions added as described above, the usefulness of the
summary table is further enhanced.

In this embodiment, as discussed above, a vast amount of information is
clustered, and the clustering results are displayed together with the summary table
containing the summary of the clustering result in a manner that allows the user to
see the clustering result at a glance. Even when the clustering results become a vast
amount of information, the user can still view all clustering results. This
arrangement substantially helps the user to find desired information. With the
variety of functions implemented in the summary table and the clustering results,
the user can see which cluster has the highest degree of importance with respect to
the keyword input by the user. When shifting from the summary table to the
clustering results, or returning from the clustering results to the summary table,
the head portion of each content is displayed at a proper location on the screen.
Even when the clustering results are alternated with the summary table on the
screen, the user enjoys an efficient and comfortable operation, free from partly
hidden information and from being at a loss as to where the information of
interest is.

This embodiment is not limited to the above discussion, and various
modifications are possible within the scope of this embodiment. For example, in the
above-referenced embodiment, the cluster-indexing information (the information to
be clustered) is the title of the searched document. The clustering process may also
be performed using URL addresses (excluding http://), besides the titles.

The clustering process also may be performed using an update date (a simple
time or date/hour/time within the latest one month), and a file size (a byte size of
the body of a Web page). These may be used alone or in combination. By selecting
the cluster-indexing information, the apparatus performs clustering in a manner
characteristic of the selected cluster-indexing information. The summary table is
generated based on the respective clustering results.
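
Using the update date and the file size alone or in combination can be sketched in
Python by keying each document on a tuple of the selected cluster-indexing values.
The 30-day window, the 10,000-byte limit, the bucket labels, and the sample
documents are hypothetical choices made only to keep the example concrete.

```python
from collections import defaultdict
from datetime import date, timedelta


def update_bucket(doc, today):
    """Classify a document by whether it was updated within roughly one month."""
    recent = today - doc["updated"] <= timedelta(days=30)
    return "within one month" if recent else "older"


def size_bucket(doc, limit_bytes=10_000):
    """Classify a document by the byte size of its body."""
    return "small" if doc["size"] < limit_bytes else "large"


def cluster_by(documents, index_functions):
    """Group document titles by the tuple of selected cluster-indexing values."""
    clusters = defaultdict(list)
    for doc in documents:
        key = tuple(fn(doc) for fn in index_functions)
        clusters[key].append(doc["title"])
    return dict(clusters)


today = date(2000, 1, 31)  # hypothetical reference date
docs = [
    {"title": "A", "updated": date(2000, 1, 20), "size": 4_000},
    {"title": "B", "updated": date(1999, 10, 1), "size": 4_000},
    {"title": "C", "updated": date(2000, 1, 25), "size": 50_000},
]


def by_update(doc):
    return update_bucket(doc, today)


by_date = cluster_by(docs, [by_update])                 # one index alone
combined = cluster_by(docs, [by_update, size_bucket])   # two indexes combined
print(by_date)
print(combined)
```

Each resulting dictionary corresponds to one clustering result, from which a
separate summary table can be generated.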

In the above discussion, search results provided by a single general-purpose
search service are subjected to the clustering process. The present invention is
also applicable to the case in which the search results provided by a plurality of
search services are subjected to the clustering process. The clustering process is
performed on the individual search results from the search services, and the
clustering results are used to generate the summary table.

In this embodiment, as described above, the plurality of searched documents
are subjected to the clustering process, the summary table for allowing the user to
glance at the summary of the clustering results obtained through the clustering
process is formed, and the summary table is displayed together with the clustering
results. Even when the clustering results become a vast amount of information, the
user can still roughly grasp the content of all clustering results. This
arrangement substantially helps the user to find desired information. Since the
user can roughly grasp the content of all clustering results, the user can find not
only desired information in an efficient manner but also unexpected information.
The user can thus learn new information with ease.

By mutually linking the clustering results to the clustering result summary
table, the user can easily shift from the clustering result summary screen to the
corresponding cluster portion of the clustering results. The user can also return
to the summary table. Even if the clustering results have a vast amount of data,
the user reaches a cluster that is likely to contain desired information by
repeatedly alternating between the two. The user can thus find desired
information in an efficient manner.

In the displaying of the clustering result summary table, the displaying
manner of the cluster name in the summary table may be made different depending
on the degree of importance. The displaying manner of the cluster containing the
keyword input by the user may be made different from that of the other clusters.
In this way, the user can quickly estimate where to locate the desired information
by glancing at the clustering result summary table. The user can thus efficiently
find desired information.